Abstract

While 2SLS is the most widely used estimator for simultaneous equation models, OLS
may do better in finite samples. Here we demonstrate analytically that for the widely
used simultaneous equation model with one jointly endogenous variable and valid
instruments, 2SLS has smaller MSE, up to second order, than OLS unless the $R^2$, or
the F statistic, of the reduced form equation is extremely low. We then consider the
relative performance of the estimators when the instruments are invalid, i.e. the
instruments are correlated with the stochastic disturbance. Here, both 2SLS and OLS are
biased in finite samples and inconsistent. We investigate conditions under which the
approximate finite sample bias or the MSE of 2SLS is smaller than the corresponding
statistics for the OLS estimator. We again find that 2SLS does better than OLS under a
wide range of conditions. We then present a method of sensitivity analysis, which
calculates the maximal asymptotic bias of 2SLS under small violations of the exclusion
restrictions. For a given correlation between invalid instruments and the error term, we
derive the maximal asymptotic bias. We apply our results to IV estimation of the returns
to education. We derive the bias in the estimated standard errors of 2SLS for the first
time. This derivation also has implications for the test of over-identifying restrictions.
While 2SLS is the most widely used estimator for simultaneous equation models,
OLS may do better in finite samples. Econometricians have recognized this possibility,
and many Monte Carlo studies were undertaken in the early years of econometrics to
attempt to determine the conditions under which OLS might do better than 2SLS. Here we
demonstrate analytically that for the widely used simultaneous equation model with
one jointly endogenous variable and valid instruments, 2SLS has smaller MSE, up
to second order, than OLS unless the $R^2$, or the F statistic, of the reduced form
equation is extremely low. We provide a calculation, based on observable statistics and
one unknown parameter, that should give valuable information about the
relative MSEs of OLS and 2SLS.

We then consider the relative performance of the estimators when the instruments are
invalid, i.e. the instruments are correlated with the stochastic disturbance. Here, both
2SLS and OLS are biased in finite samples and inconsistent. We investigate conditions
under which the approximate finite sample bias or the MSE of 2SLS is smaller than the
corresponding statistics for the OLS estimator. We again find that 2SLS does better than
OLS under a wide range of conditions, which we characterize as functions of observable
statistics and one unobservable statistic.

We  then  present  a  method  of  sensitivity  analysis,  which  calculates  the  maximal 
asymptotic  bias  of  2SLS  under  small  violations  of  the  exclusion  restrictions.  For  a  given 
correlation  between  invalid  instruments  and  the  error  term,  we  derive  the  maximal 
asymptotic  bias.  We  demonstrate  how  such  maximal  asymptotic  bias  can  be  estimated  in 
practice. 

Next,  we  turn  to  inference.   In  the  "weak  instruments"  situation  the  bias  in  the 
2SLS  estimator  creates  a  problem,  since  it  is  biased  towards  the  OLS  estimator,  which  is 
also  biased.  The  other  problem  that  arises  is  that  the  estimated  standard  errors  of  the 
2SLS  estimator  are  often  much  too  small  to  signal  the  problem  of  imprecise  estimates. 
Here we derive the bias in the estimated standard errors for the first time, which turns out
to cause the problem. This derivation also has implications for the test of
over-identifying restrictions.

We do not survey the weak instruments literature. For recent surveys see Stock
et al. (2002) and Hahn and Hausman (2003).


I  Model  Specification 

We  begin  with  the  model  specification  with  one  right  hand  side  (RHS)  jointly 
endogenous  variable  so  that  the  left  hand  side  (LHS)  variable  depends  only  on  the  single 
jointly  endogenous  RHS  variable.  This  model  specification  accounts  for  other  RHS 
predetermined  (or  exogenous)  variables,  which  have  been  "partialled  out"  of  the 
specification. We will assume that¹


(1.1)  $y_1 = \beta y_2 + \varepsilon_1$

(1.2)  $y_2 = z\pi_2 + v_2$


where $\dim(\pi_2) = K$. Thus, the matrix $z$ is the matrix of all predetermined variables,
and equation (1.2) is the reduced form equation for $y_2$ with coefficient vector $\pi_2$.
We also assume homoscedasticity:


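(1.3)  $\begin{pmatrix} \varepsilon_{1i} \\ v_{2i} \end{pmatrix} \sim N(0, \Omega), \quad \Omega = \begin{pmatrix} \sigma_{\varepsilon\varepsilon} & \sigma_{\varepsilon v} \\ \sigma_{\varepsilon v} & \sigma_{vv} \end{pmatrix}$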


We  use  the  following  notation: 


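$y_1 = (y_{11}, \ldots, y_{1n})'$, with $y_2$, $\varepsilon_1$ and $v_2$ stacked analogously, and
$\sigma_{\varepsilon\varepsilon} = Var(\varepsilon_{1i})$, $\sigma_{vv} = Var(v_{2i})$, $\sigma_{\varepsilon v} = Cov(\varepsilon_{1i}, v_{2i}) = \sigma_{v\varepsilon}$.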


We initially assume the presence of valid instruments, $E[z'\varepsilon_1/n] = 0$ and $\pi_2 \neq 0$.


¹ Without loss of generality we normalize the data such that $y_2$ has zero mean.


II  Estimation  with  Valid  Instruments 

From previous papers, e.g. Hahn and Hausman (2002a, 2002b), we know the bias
and MSE of 2SLS up to second order. The bias of 2SLS is

(2.1)  $E[b_{2SLS}] - \beta \approx \frac{K\sigma_{\varepsilon v}}{n\Theta} = \frac{K\sigma_{\varepsilon v}}{R^2 (y_2'y_2)}$,

where $\Theta = \pi_2'z'z\pi_2/n$, assumed to be fixed, $R^2$ is the theoretical value from the
second (reduced form) equation, and $y_2$ is normalized to have mean zero. We assume:
Condition 1: $K \to \infty$ as $n \to \infty$ such that $K/\sqrt{n} = \mu + o(1)$ for some $\mu \neq 0$.

A        Properties  of  the  2SLS  Estimator 

As a special case of Theorem 3 in Section III, we obtain that:


Theorem 1:  $\sqrt{n}(b_{2SLS} - \beta) \Rightarrow N\left(\frac{\mu\sigma_{\varepsilon v}}{\Theta}, V_{2SLS}\right)$


Here, $V_{2SLS} = \sigma_{\varepsilon\varepsilon}/\Theta$, the usual 2SLS first order asymptotic variance. As a
consequence, we obtain the approximate MSE of 2SLS:


(2.2)  $MSE(2SLS) = M_2 = \frac{\mu^2\sigma_{\varepsilon v}^2}{n\Theta^2} + \frac{V_{2SLS}}{n} = \frac{K^2\sigma_{\varepsilon v}^2}{R^4 (y_2'y_2)^2} + \frac{\sigma_{\varepsilon\varepsilon}}{R^2 (y_2'y_2)}$


Note that both terms in equation (2.2) approach zero as $(y_2'y_2)$ increases with increasing
sample size. The first term, the squared bias, approaches zero more quickly, as expected,
since 2SLS is "root n" consistent.

We now simplify the $M_2$ expression for 2SLS. Without loss of generality we use the
normalization (rescaling of units) $\sigma_{\varepsilon\varepsilon} = \sigma_{vv} = 1$ so that $Var(y_2) = 1/(1-R^2)$ and
$\sigma_{\varepsilon v} = \rho$.² Using this normalization we find:


(2.3)  $MSE(2SLS) = M_2 = \frac{(K^2\rho^2(1-R^2) + nR^2)(1-R^2)}{n^2 R^4}$


The convergence of the MSE to zero in terms of the sample size n becomes quite evident
with this normalization.


² These parameters are theoretical values from the underlying model specification for given
parameter values.


B        Properties  of  the  OLS  Estimator 

We now calculate the bias and MSE of the OLS estimator. The approximate bias
is:


(2.4)  $E[b_{OLS}] - \beta \approx \frac{\sigma_{\varepsilon v}}{Var(y_2)} = \frac{\sigma_{\varepsilon v}}{\Theta + \sigma_{vv}}$


The  approximate  variance  is  defined  as: 


.2      /2v2 


(2.5)      V,,,=-^ Oj^^l^^ 


As a special case of Theorem 4 in Section III, we obtain the distribution for the OLS
estimator:


Theorem 2:  $\sqrt{n}\left(b_{OLS} - \beta - \frac{\sigma_{\varepsilon v}}{\Theta + \sigma_{vv}}\right) \Rightarrow N(0, V_{OLS})$


This result is the same as in Hausman (1978). Thus, the approximate MSE of OLS is


(2.6)  $MSE(OLS) = M_O = \frac{\sigma_{\varepsilon v}^2}{(\Theta + \sigma_{vv})^2} + \frac{V_{OLS}}{n}$


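The inconsistency of OLS is evident from equation (2.6) because while the second term
goes to zero as n becomes large, the first term is not a function of n. The OLS MSE
under the normalization used above becomes:


(2.7)  $M_O = \frac{\sigma_{\varepsilon v}^2}{var^2(y_2)} + \frac{\sigma_{\varepsilon\varepsilon}}{n\, var(y_2)} - \frac{\sigma_{\varepsilon v}^2}{n\, var^2(y_2)} - \frac{2\sigma_{\varepsilon v}^2 R^2}{n\, var^2(y_2)} = \frac{(1-R^2)\left[\rho^2(n-1-2R^2)(1-R^2) + 1\right]}{n}$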


We  see  that  the  first  term  in  the  OLS  MSE  is  the  usual  first  order  bias  term  squared.  It 
does  not  go  to  zero  as  n  becomes  large  since  OLS  is  inconsistent. 

From the 2SLS MSE calculations, as the sample size grows large the denominator
of the 2SLS MSE in equation (2.3) dominates, and the MSE goes to zero. By
contrast, for the OLS MSE in equation (2.7) the numerator also grows with n, as the
bias of the OLS estimator does not go to zero with the sample size. Thus, for large
samples 2SLS is consistent and OLS is not. We now consider how the estimators do in
finite samples.
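As a rough numerical illustration of the two normalized MSE expressions, the following Python sketch evaluates equations (2.3) and (2.7) as we read them, under $\sigma_{\varepsilon\varepsilon} = \sigma_{vv} = 1$. The function names and the parameter values (K = 5, $R^2$ = 0.1, $\rho$ = 0.5) are illustrative assumptions, not values taken from the figures or tables.

```python
def mse_2sls(n, k, r2, rho):
    """Approximate second-order MSE of 2SLS under the normalization, eq. (2.3)."""
    # (1 - R^2) * [K^2 rho^2 (1 - R^2) + n R^2] / (n^2 R^4), with R^4 = (R^2)^2
    return (1.0 - r2) * (k**2 * rho**2 * (1.0 - r2) + n * r2) / (n**2 * r2**2)


def mse_ols(n, r2, rho):
    """Approximate MSE of OLS under the normalization, eq. (2.7) as read here."""
    # (1 - R^2) * [rho^2 (n - 1 - 2 R^2)(1 - R^2) + 1] / n
    return (1.0 - r2) * (rho**2 * (n - 1.0 - 2.0 * r2) * (1.0 - r2) + 1.0) / n


if __name__ == "__main__":
    # Hypothetical design: K = 5 instruments, R^2 = 0.1, rho = 0.5.
    for n in (100, 1000, 10000):
        print(n, mse_2sls(n, 5, 0.1, 0.5), mse_ols(n, 0.1, 0.5))
```

In this sketch the 2SLS value shrinks toward zero as n grows, while the OLS value settles near the squared first-order bias $(1-R^2)^2\rho^2$, which is the pattern described in the text.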

C        Bias  Comparisons  of  the  2SLS  and  OLS  Estimators 

We compare the approximate finite sample bias of 2SLS to the approximate bias of
OLS:


(2.8)  $\frac{B_{2SLS}}{B_{OLS}} = \frac{K}{nR^2} \approx \frac{1}{1-R^2}\,\frac{1}{F}$


where the F statistic is the "theoretical" F-statistic from the first-stage reduced form.
Thus, if $F \gg 1$, 2SLS has less bias. However, the OLS variance is less than the 2SLS
variance, so we compare the MSEs below.
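The bias ratio in equation (2.8) is simple enough to tabulate directly. The sketch below is a minimal Python illustration; the "theoretical" F is approximated here by $nR^2/(K(1-R^2))$, ignoring any degrees-of-freedom correction, which is our simplifying assumption, and the function names are hypothetical.

```python
def bias_ratio(n, k, r2):
    """Ratio of approximate 2SLS bias to OLS bias, K / (n R^2), as in eq. (2.8)."""
    return k / (n * r2)


def theoretical_f(n, k, r2):
    """Large-sample first-stage F, assumed here to be n R^2 / (K (1 - R^2))."""
    return n * r2 / (k * (1.0 - r2))


if __name__ == "__main__":
    # Hypothetical designs with n = 100 and R^2 = 0.1.
    for k in (5, 10, 30):
        f_stat = theoretical_f(100, k, 0.1)
        print(k, f_stat, bias_ratio(100, k, 0.1), 1.0 / ((1.0 - 0.1) * f_stat))
```

Under this approximation the two expressions for the ratio coincide, so a ratio below one corresponds to $(1-R^2)F > 1$.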

Before leaving the bias comparisons, we also consider what happens when we are
close to being unidentified so that $\pi_2 = a/\sqrt{n}$, where the vector $a$ has dimension K. Thus,
the reduced form coefficients are "local to zero". With $\pi_2 = a/\sqrt{n}$, equation (2.1)
predicts the bias of 2SLS to be

(2.9)      £[6,,,J-/?  = 


SLS  J        r-  ^  ■ 

K 
where equation (2.9) is an approximation to the asymptotic bias of 2SLS under the
asymptotics where $\pi_2 = a/\sqrt{n}$. Here, $\Upsilon = a'z'za$. On the other hand, equation (2.4)
predicts the approximate bias for OLS to be:


(2.10)    E[baJ-P-       "" 


n 


Taking  the  ratio  of  the  biases  under  local  to  zero  asymptotics: 
(2.11)    ^"  -" 


K' 
From equation (2.11), it follows that the bias of 2SLS is smaller than that of OLS as long as
$K \ll n$, a condition which will always be satisfied in practice.

D        MSE  Comparisons  of  the  2SLS  and  OLS  Estimators 

We next compare the MSE of 2SLS to the MSE of OLS using the normalization
(and non-local asymptotics):


(2.12)  $\frac{M_2}{M_O} = \frac{K^2\rho^2(1-R^2) + nR^2}{(nR^4)\left[(n-1-2R^2)\rho^2(1-R^2) + 1\right]}$


where $R^4 = (R^2)^2$. The correlation parameter $\rho$ is the key parameter in simultaneous
equation analysis because if it is zero the OLS estimator is the unbiased Gauss-Markov
estimator and the ratio of MSEs in equation (2.12) equals $1/R^2 > 1$, but OLS is biased
and inconsistent if the parameter value of $\rho$ is not zero.

Which estimator to use will depend on whether equation (2.12) is less than or
greater than unity. We can solve for the "critical value" of $\rho^2$ which causes the MSE of
the two estimators to be equal. The solution for this "critical value" has a remarkably
simple form:


(2.13)  $\rho^2 = \frac{nR^2}{nR^4(n-1-2R^2) - K^2}$


As n becomes large the "critical value" of $\rho^2$ goes to zero. In any particular sample $R^2$
and F can typically be accurately estimated from the unbiased estimates of the reduced
form so that only $\rho^2$ is unknown. While this parameter value is typically unknown, the
applied econometrician will often have good a priori knowledge of $\rho$ so that she will
be able to determine whether the critical value is below the square of the correlation
coefficient.³ As we now demonstrate, the critical value is often so low that 2SLS will
have a lower MSE than OLS, even for situations with relatively "weak instruments" or a
low F statistic.

In Figure 1 we calculate the critical value of $\rho$ (using the absolute value) for a
range of values of $R^2$ for K of 5, 10, and 30 and for a sample size of n = 100.⁴ The
results of Figure 1 demonstrate that for K=5 if $R^2 > 0.1$ then the critical value of $\rho$ is
sufficiently small that 2SLS should typically be used in terms of the MSE comparison.
For K=10 2SLS will typically be better if $R^2 > 0.2$. However, for K=30 we typically
require $R^2 > 0.4$. In Table 1 we repeat the calculations for n=500 and n=1000. Here we
find that if $R^2 > 0.1$ then 2SLS typically will have a lower MSE. Thus, except in the case
of weak instruments, which can arise when $R^2$ is low and the number of instruments
is high, 2SLS is typically the preferred estimator based on an approximate finite sample
comparison of MSEs.
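The critical value calculation can be sketched in a few lines of Python. The code below implements equations (2.12) and (2.13) as we read them; the function names are hypothetical, and returning `None` when the denominator of (2.13) is not positive is our own convention for the case in which no value of $\rho$ equalizes the two MSEs.

```python
import math


def mse_ratio(n, k, r2, rho):
    """Ratio M_2 / M_O of approximate MSEs, eq. (2.12) as read here."""
    numerator = k**2 * rho**2 * (1.0 - r2) + n * r2
    denominator = (n * r2**2) * ((n - 1.0 - 2.0 * r2) * rho**2 * (1.0 - r2) + 1.0)
    return numerator / denominator


def critical_rho_sq(n, k, r2):
    """Critical rho^2 from eq. (2.13); None if the MSEs never cross."""
    denominator = n * r2**2 * (n - 1.0 - 2.0 * r2) - k**2
    if denominator <= 0.0:
        return None  # assumed convention: 2SLS MSE is never below OLS MSE
    return n * r2 / denominator


if __name__ == "__main__":
    # Hypothetical grid in the spirit of Figure 1 (n = 100).
    for k in (5, 10, 30):
        for r2 in (0.1, 0.2, 0.4):
            crit = critical_rho_sq(100, k, r2)
            label = "none" if crit is None else round(math.sqrt(crit), 3)
            print(k, r2, label)
```

A critical $\rho^2$ above one would likewise mean that no admissible correlation makes 2SLS preferable on this criterion, since $|\rho| \le 1$.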

III        Estimation with Invalid Instruments

Up to this point we have assumed that the instruments are valid so that they are
orthogonal to the stochastic disturbance $\varepsilon_1$. However, the econometrician may not be
certain that the instruments satisfy the orthogonality condition. We now consider the
situation where the orthogonality condition on the instruments fails so that
$E[z'\varepsilon_1/n] \neq 0$. We first consider the "large sample bias" of 2SLS:


³ The parameter $\rho$ is also estimated from the 2SLS estimation, but a good estimate may be
difficult to achieve in a "weak instrument" situation.
⁴ The curves for increasing K lie to the right of each other.


(3.1)  $plim[b_{2SLS}] - \beta = \frac{\sigma_{W\varepsilon}}{R^2\sigma_{y_2 y_2}}$

where $W = z\pi_2$. We compare this with the analogous expression for OLS:


(3.2)  $plim[b_{OLS}] - \beta = \frac{\sigma_{y_2\varepsilon}}{\sigma_{y_2 y_2}}$


In general either estimator may be preferred on this criterion depending on
circumstances. The numerator of equation (3.1) would likely be smaller ("less
correlation" in the instrument) than the numerator of equation (3.2), but the denominator
of equation (3.1) is always smaller since $R^2 < 1$. Indeed, if $R^2$ is very small, the OLS
estimator may do better in terms of inconsistency.

A        Invalid  Instrument  Specification 

To  do  asymptotic  approximations  we  need  to  specify  the  correlation  of  the 
instrument  with  the  stochastic  disturbance  in  the  structural  equation  (1.1).  We  use  a  local 
specification  similar  to  the  approach  in  Hausman  (1978,  Theorem  2.1): 

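(3.3)  $\varepsilon_1 = z(\gamma/\sqrt{n}) + e$ for $\gamma \neq 0$.

We assume that $(e, v)$ is homoscedastic and zero mean normally distributed with
covariance matrix:


$\begin{pmatrix} e_i \\ v_{2i} \end{pmatrix} \sim N(0, \Omega), \quad \Omega = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}$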


B        Properties  of  the  2SLS  Estimator  with  Invalid  Instruments 

We  derive  the  asymptotic  distribution  of  the  2SLS  estimator  with  locally  invalid 
instruments  in  Appendix  A: 


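Theorem 3:  $\sqrt{n}(b_{2SLS} - \beta) \Rightarrow N\left(\frac{\Xi + \mu\sigma_{12}}{\Theta}, V_{2SLS}\right) = N\left(\frac{\sqrt{n}\,\sigma_{W\varepsilon} + (K/\sqrt{n})\,\sigma_{12}}{R^2\sigma_{y_2 y_2}}, V_{2SLS}\right)$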


where $W = z\pi_2$ is the instrument and $\Xi = \pi_2'z'z\gamma/n$, which is assumed to be fixed. The
first term in the numerator of the mean, $\Xi$, arises from failure of the orthogonality
condition. The second term is the usual finite sample bias term and it decreases with the
sample size. The variance continues to be $V_{2SLS}$ under instrument invalidity because of
the local departure in equation (3.3), similar to Hausman (1978, p. 1256).

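We use Theorem 3 to calculate the approximate bias of the 2SLS estimator with
invalid instruments:


(3.4)  $E[b_{2SLS}] - \beta \approx \frac{\Xi/\sqrt{n} + K\sigma_{12}/n}{\Theta} = \frac{1-R^2}{R^2}\left(\frac{1}{\sqrt{n}}\alpha\rho + \frac{1}{n}K\rho\right)$

where we use the previous normalizations and set $\sigma_{W\varepsilon} = \Xi/\sqrt{n} = \alpha\rho/\sqrt{n}$ for $\alpha < 1$.
Using Theorem 3 we find the MSE of 2SLS to be: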


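(3.5)  $MSE_{2SLS} = \frac{(\Xi/\sqrt{n} + K\sigma_{12}/n)^2}{\Theta^2} + \frac{V_{2SLS}}{n} = \left(\frac{1-R^2}{R^2}\right)^2\left(\frac{1}{\sqrt{n}}\alpha\rho + \frac{1}{n}K\rho\right)^2 + \frac{1}{n}\,\frac{1-R^2}{R^2}$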


C        Distribution  of  the  OLS  Estimator  with  Invalid  Instruments 

We derive the asymptotic distribution of the OLS estimator with locally invalid
instruments in Appendix A:


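Theorem 4:  $\sqrt{n}\left(b_{OLS} - \beta - \frac{\sigma_{12}}{\Theta + \sigma_{22}}\right) \Rightarrow N\left(\frac{\Xi}{\Theta + \sigma_{22}}, V_{OLS}\right) = N\left(\frac{\sqrt{n}\,\sigma_{W\varepsilon}}{\sigma_{y_2 y_2}}, V_{OLS}\right)$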


The distribution is centered around the usual OLS bias, as before, and the numerator of
the mean of the distribution arises from the instrument invalidity. Again, the variance
continues to be $V_{OLS}$ under instrument invalidity because of the local departure in
equation (3.3). Using Theorem 4 we find the MSE of OLS to be:


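(3.6)  $MSE_{OLS} = \left(\frac{\Xi/\sqrt{n} + \sigma_{12}}{\Theta + \sigma_{22}}\right)^2 + \frac{V_{OLS}}{n} = (1-R^2)^2\left(\frac{1}{\sqrt{n}}\alpha\rho + \rho\right)^2 + \frac{1}{n}\left(-\rho^2(1+2R^2)(1-R^2) + 1\right)(1-R^2)$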


The  first  term  in  parentheses  is  the  "usual"  simultaneous  equation  bias  of  OLS  that  does 
not  decrease  with  the  sample  size. 

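We consider a special situation which makes the formulae easier to interpret. Let
$\gamma = \tau\pi_2$ for some $\tau$. Under this proportionality assumption, the asymptotic distributions
take the form:


$\sqrt{n}(b_{2SLS} - \beta) \Rightarrow N\left(\tau + \frac{\mu\sigma_{12}}{\Theta}, V_{2SLS}\right)$

and

$\sqrt{n}\left(b_{OLS} - \beta - \frac{\sigma_{12}}{\Theta + \sigma_{22}}\right) \Rightarrow N\left(\frac{\tau}{1 + \sigma_{22}/\Theta}, V_{OLS}\right) = N\left(\tau R^2, V_{OLS}\right)$

where we have used the normalization to derive the final expression for the distribution
of OLS.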


D       Bias  Comparison  of  2SLS  with  OLS 

We now compare the bias of 2SLS under instrument invalidity with the bias of
OLS given similar circumstances. We now rewrite the bias of OLS using the
normalization:


(3.7)  $B_{OLS} = E[b_{OLS}] - \beta \approx (1-R^2)\left(\frac{1}{\sqrt{n}}\alpha\rho + \rho\right)$


As before, we take the ratio of (3.4) and (3.7):

(3.8)  $\frac{B_{2SLS}}{B_{OLS}} = \frac{\alpha\rho/\sqrt{n} + K\rho/n}{R^2\left(\alpha\rho/\sqrt{n} + \rho\right)} = \frac{\alpha\sqrt{n} + K}{R^2\left(\alpha\sqrt{n} + n\right)}$

The ratio of the biases is homogeneous of degree zero in the correlation coefficient $\rho$, so
we can simplify terms. We plot the ratio of the biases in Figure 2 for the case of n=100
and K=5 and $\alpha = 0.1$.

We find that the 2SLS bias is less than the OLS bias if:

(3.9)  $\alpha < \frac{1}{\sqrt{n}}\,\frac{nR^2 - K}{1 - R^2}$

Equation (3.9) is very easy to interpret. We calculate a "critical alpha" in Figure 3, and
note that it increases quite rapidly, so that the bias of 2SLS with invalid instruments
remains less than the bias of OLS so long as F exceeds 1.0 by a small amount. The
straightforward relationship of equation (3.9) allows for an easy interpretation on which
the econometrician may well have some a priori knowledge.

In certain situations it may be reasonable to consider a relationship between $\sigma_{W\varepsilon}$
and $R^2$ such that when the covariance is less so is $R^2$. If we totally differentiate equation
(3.9), we find that $dR^2/d\alpha = (n-K)/\left[n^{1/2}(\alpha + \sqrt{n})^2\right]$. Thus, for a given increase in the
covariance between the instrument and the stochastic term, $\alpha$, we find that the required
increase in $R^2$ is approximately at the rate of one over the square root of n to keep the
ratio of the biases approximately the same. However, the required change in $R^2$ is also
inversely related to $\alpha$.
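The bias comparison with invalid instruments can be illustrated with a short Python sketch. It codes the simplified bias ratio obtained by dividing (3.4) by (3.7) and the "critical alpha" bound of equation (3.9), both as we read them under the normalization; the function names and the example values are illustrative assumptions.

```python
import math


def bias_ratio_invalid(n, k, r2, alpha):
    """Ratio of 2SLS to OLS approximate bias with locally invalid instruments.

    Uses the simplified form (alpha*sqrt(n) + K) / (R^2 (alpha*sqrt(n) + n)),
    which follows from dividing (3.4) by (3.7); rho cancels.
    """
    root_n = math.sqrt(n)
    return (alpha * root_n + k) / (r2 * (alpha * root_n + n))


def critical_alpha(n, k, r2):
    """Upper bound on alpha from eq. (3.9) below which 2SLS has the smaller bias."""
    return (n * r2 - k) / (math.sqrt(n) * (1.0 - r2))


if __name__ == "__main__":
    # Hypothetical design matching the Figure 2 setting: n = 100, K = 5.
    for r2 in (0.05, 0.1, 0.2, 0.4):
        print(r2, critical_alpha(100, 5, r2), bias_ratio_invalid(100, 5, r2, 0.1))
```

A negative bound simply indicates that $nR^2 < K$, in which case the approximate 2SLS bias already exceeds the OLS bias when $\alpha = 0$.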

Note that the common empirical finding that the 2SLS coefficient is larger than
the OLS coefficient can arise because of the OLS bias when the instruments are valid or
because of an improper instrument. Thus, even if the instrument is "almost uncorrelated"
so that $\sigma_{W\varepsilon} \approx 0$, substantial bias can still arise because $R^2$ is often quite small in the
weak instruments situation. Thus, comparing equation (3.4) to the bias of OLS in
equation (3.7), the empirical finding that the 2SLS estimate increases compared to the
OLS estimate may indicate that the instrument is not orthogonal to the stochastic
disturbance. The resulting bias can be substantial. Indeed, it could exceed the OLS bias,
leading to an increase in the estimated 2SLS coefficient over the estimated OLS
coefficient.

E        MSE  Comparison  of  2SLS  and  OLS  with  Invalid  Instruments 

Returning to the general situation and using the normalizations, the ratio of the
MSEs is

(3.10)  $\frac{M_2}{M_O} = \frac{(1-R^2)(\alpha\rho + K\rho/\sqrt{n})^2/R^4 + 1/R^2}{(1-R^2)(\alpha\rho + \sqrt{n}\rho)^2 + 1 - (1-R^2)\rho^2 - 2(1-R^2)\rho^2 R^2}$

No straightforward condition can be derived where the ratio is less than one. We graph
the ratio of the MSEs for $\alpha = 0.1$ and K=5, n=100 in Figure 4 (please note the inverted
vertical axis). Note that the ratio of MSEs is below 1.0 except in the situation where $R^2$
becomes quite small (as with weak instruments) and $\rho$ becomes small (which decreases
the OLS bias). The situation remains essentially the same when we increase to $\alpha = 0.3$
in Figure 5. To yield a better understanding of what can happen in this situation, we plot
the situation in Figure 6 where $R^2 < 0.2$ and $\rho < 0.3$ for $\alpha = 0.1$. Figure 6 demonstrates
that the 2SLS estimator can do quite poorly compared to the OLS estimator, even though
the F statistic exceeds 1.0 by a large amount. The reason for this poor relative
performance is the small size of $\rho$, which makes OLS a relatively good estimator.
However, this situation is typically not a situation where the absolute performance of the
2SLS estimator with valid instruments would be poor under weak instruments because $\rho$
is not large. It is the presence of invalid instruments, with only a "small amount" of
correlation with the stochastic disturbance, that creates the problem.

F        Comparison of a (Second Order) Unbiased Estimator

In our comparisons of 2SLS with OLS, two sources of bias arise. The first source
of bias is from the use of estimated parameters, $\pi_2$ in equation (1.2), in forming the
instruments. This source of bias disappears as the sample becomes large. The second
source of bias is from the use of invalid instruments, $\gamma \neq 0$ in equation (3.3). This source
of bias does not disappear sufficiently fast with the sample size for 2SLS to be
consistent. An interesting question is how the comparison of IV to OLS
would change if the first source of bias were eliminated. We can eliminate this source of
bias (to second order) by using the Nagar estimator.⁵

We  derive  the  asymptotic  distribution  of  the  Nagar  estimator  with  locally  invalid 
instruments  in  Appendix  A: 


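Theorem 5:  $\sqrt{n}(b_N - \beta) \Rightarrow N\left(\frac{\Xi}{\Theta}, V_{2SLS}\right) = N\left(\frac{\sqrt{n}\,\sigma_{W\varepsilon}}{R^2\sigma_{y_2 y_2}}, V_{2SLS}\right)$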


where $W = z\pi_2$ is the instrument and $\Xi = \pi_2'z'z\gamma/n$, which is assumed to be fixed, and
as before $V_{2SLS} = \sigma_{\varepsilon\varepsilon}/\Theta$. Thus, to compare the MSE of the Nagar estimator to the MSE
of the 2SLS estimator with invalid instruments, we see that the variance of the two
estimators is the same, but that the bias differs as explained above. However, when we
compare the squared bias of 2SLS from equation (3.4) with that of the Nagar estimator we
find that


(3.11)  $\left(\frac{\Xi/\sqrt{n} + K\sigma_{12}/n}{\Theta}\right)^2 - \left(\frac{\Xi/\sqrt{n}}{\Theta}\right)^2$


can  be  less  than  or  greater  than  zero.  Thus,  we  cannot  conclude  that  using  the  Nagar 
estimator  to  compare  with  OLS  would  make  the  comparison  more  favorable  to  an  IV 
estimator. 

IV         Sensitivity  Analysis 

Card (2001) discusses possible concerns that the instruments may be invalid in
discussing the empirical literature that estimates the return to additional education. The
use of instrumental variables in this situation began with Griliches's (1977) well-known
paper. To investigate the possibility of invalid instruments, we consider the specification:


⁵ The Nagar estimator may perform poorly with weak instruments because of its lack of moments. See
Hahn, Hausman, and Kuersteiner (2002).
(5.1)  $y_1 = \beta y_2 + z\theta + \varepsilon = \beta y_2 + \varepsilon^{*}, \qquad \varepsilon^{*} = z\theta + \varepsilon$

Note that we have added $z\theta$ to the error $\varepsilon$. We derive the maximal asymptotic bias for
a small violation of the exclusion restriction in Appendix B, where $\psi$ is the correlation
between $z\pi_2$ and $\varepsilon^{*}$, so that $\psi^2$ is the $R^2$ between $z\pi_2$ and $\varepsilon^{*}$. We find the maximal
asymptotic bias to be:


Theorem 6:  $\max \left| plim\ bias\ b_{2SLS}(\theta) \right| = \frac{1}{R}\left(\frac{plim\ n^{-1}\sum_i \varepsilon_i^2}{plim\ n^{-1}\sum_i y_{2i}^2}\right)^{1/2}\left(\frac{\psi^2}{1-\psi^2}\right)^{1/2}$


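Note that the maximal asymptotic bias can be consistently estimated by


(5.2)  $\frac{1}{R}\left(\frac{n^{-1}\sum_i \hat{\varepsilon}_i^2}{n^{-1}\sum_i y_{2i}^2}\right)^{1/2}\left(\frac{\psi^2}{1-\psi^2}\right)^{1/2}$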


Imbens (2003) suggested a different sensitivity analysis in a program evaluation
model with a binary explanatory variable, extending Rosenbaum and Rubin (1983). With
some simplification, it can be said that he considers a parametric model where an omitted
variable bias is suspected. It is well known that the omitted variable bias can be related to
two parameters, the coefficient of the omitted variable and the correlation of the omitted
variable with other observed variables in the model, e.g. Griliches (1957). The sensitivity
analysis of Imbens (2003) is based on manipulation of these two parameters.

We now consider the effect of invalid instruments in an empirical example.
Estimating the return to education has been a well-researched problem over the past 25
years. Griliches (1977) is a seminal paper that uses IV to estimate returns to schooling.
The usual result is that researchers find the OLS estimate to be smaller than the 2SLS
estimate by approximately 25%-50%, e.g. Card (2001). This result arises from a tradeoff
between two potential sources of bias: (1) an omitted variable, call it "spunk", in the
stochastic disturbance may be correlated with the amount of education. Thus, people


⁶ Imbens (2003) considers the question of sensitivity analysis, but not in the context of instrumental
variables.
with more spunk achieve higher education levels and also higher earnings, because they
work harder both in school and on the job. This left out variable would lead to an upward
bias in the least squares estimate of the schooling coefficient. (2) errors in variables (EIV)
that arise because years of schooling are a noisy measure of "useful knowledge" attained
with more years of school that leads to higher earnings. Here the EIV would lead to a
downward bias in the least squares estimate of the schooling coefficient. The typical IV
result finds that the EIV effect is larger than the left out variable effect, so the 2SLS
estimates typically exceed the OLS results by a significant amount.

We now consider the Angrist and Krueger (AK) results, which have an extremely
large sample and use quarter of birth to form instruments, which may be more likely to be
orthogonal to the stochastic disturbance than the family background and
other types of instruments typically used in the empirical returns to schooling literature.
However, the AK instruments have an extremely low $R^2$ that could help create a weak
instruments situation. Angrist and Krueger (1991) used a sample of n = 329,509
observations to estimate the returns to education. Using the AK data we estimate the
2SLS return to education to be 0.0891 (.016) using K=30 after partialling out the other
right hand side variables. This estimate is closer to the OLS estimate of 0.071 (.0003)
than expected given other empirical results. After partialling out, we find that the
average squared residuals equal 0.41, the average of the squared partialled out right hand
side endogenous variable (education) equals 10.8, and $R^2 = .00044662$. For $\psi^2 = 0.0001$, we
find that the solution to equation (5.2) is 0.0925. This maximal bias exceeds the 2SLS
estimate of 0.0891, so a small amount of bias could either eliminate any estimated return
to education or double the estimate.
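The sensitivity calculation can be reproduced approximately from the summary statistics quoted above. The Python sketch below evaluates the estimator in equation (5.2) as we read it, namely $(1/R)$ times the square root of the ratio of average squared residuals to average squared $y_2$, times $(\psi^2/(1-\psi^2))^{1/2}$. The function and argument names are hypothetical, and the inputs are the rounded values reported in the text rather than the underlying AK data.

```python
import math


def max_asymptotic_bias(mean_sq_resid, mean_sq_y2, r2, psi2):
    """Estimated maximal asymptotic bias of 2SLS, eq. (5.2) as read here.

    mean_sq_resid : average squared 2SLS residual
    mean_sq_y2    : average squared (partialled out) endogenous regressor
    r2            : first-stage R^2
    psi2          : assumed squared correlation between the instrument index
                    and the structural error
    """
    scale = math.sqrt(mean_sq_resid / mean_sq_y2)
    return scale / math.sqrt(r2) * math.sqrt(psi2 / (1.0 - psi2))


if __name__ == "__main__":
    # Rounded AK summary statistics quoted in the text.
    ak_bias = max_asymptotic_bias(0.41, 10.8, 0.00044662, 0.0001)
    print(ak_bias)
    # The bound scales like psi / R, so a stronger first stage shrinks it quickly.
    print(max_asymptotic_bias(0.41, 10.8, 0.05, 0.0001))
```

With these rounded inputs the sketch gives a value of roughly 0.092, in line with the 0.0925 reported above; the small gap is presumably due to rounding of the quoted inputs. Note that the sample size does not enter the calculation.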

Our finding that the returns to schooling coefficient could be over two times the OLS
estimate contrasts with the results of Manski and Pepper (2000), who apply Manski's
(1990, 2003) non-parametric bounds approach. Manski and Pepper (on a different
sample) find that the upper bound is substantially less than two times the usual OLS
estimates of the returns to schooling. However, the Manski approach does not allow for
errors in variables. This omission may significantly limit the empirical relevance of the
Manski approach to this problem.

This result demonstrates that use of a "weak identification" strategy such as the
AK approach is extremely sensitive to very small departures from the IV orthogonality
assumption. Note from the result in Theorem 6 that this extreme sensitivity does
not decrease with increasing sample size. Thus, the AK estimate of the returns to
schooling is very sensitive despite an extremely large sample size of n = 329,509. Our
results caution against using a weak identification strategy of the kind that has become
widely used in applied econometrics.

V  Bias  in  Estimated  Standard  Errors 

We have previously discussed the bias in the 2SLS estimator in equation (2.1)
and Theorem 1. In the "weak instruments" situation this bias may be quite large. A
further problem arises in that the 2SLS estimator is biased in the same direction as the
OLS estimator, as equation (2.4) and Theorem 2 demonstrate. Thus, a Hausman (1978)
type specification test will be biased towards not rejecting the null hypothesis of
orthogonality between $\varepsilon_1$ and $v_2$ in equations (1.1) and (1.2). However, another problem
has been recognized in the weak instruments situation. The estimated standard errors for
the 2SLS estimator are downward biased, sometimes leading to the mistaken inference
that the 2SLS estimates are much more precise than they actually are. From analysis
based on first order asymptotics, the usual conclusion would be that with "weak
instruments" the reported standard error of the 2SLS estimator would be sufficiently
large to signal that so much uncertainty exists with the estimate that it would
not be of much use. However, researchers have found that, to the contrary, the
2SLS estimator in the presence of weak instruments often leads to a reasonably small
standard error. Thus, the researcher may be unaware of the weak instruments problem,
although Hahn and Hausman (2002, 2003) propose a test that is useful in identifying when
weak instruments are causing a problem. The source of the problem of small reported standard


More  generally,  since  the  bias  in  the  OLS  estimate  when  ElV  exists  depends  on  the  variance  of  the 
measurement  error,  or  alternatively  the  R"  of  the  regression,  typically  no  bounds  exist  in  the  ElV  problem 
for  the  estimated  coefficient  unless  some  judgment  is  made  regarding  the  unknowTi  variance.  For  further 
discussion,  see  Hausman  (2001). 
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errors  of  the  2SLS  estimator  has  not  been  discussed  in  the  literature.  Here  we  derive  the 
source  of  the  problem  and  offer  a  possible  approach  to  fixing  it. 

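The variance of 2SLS is derived in Theorem 1 and takes the usual form of
$V_{2SLS} = \sigma_{\varepsilon\varepsilon} / \Theta$, where $\Theta = \pi'z'z\pi / n$ is assumed to be fixed. Now $\Theta$ is
not difficult to estimate, since unbiased estimates of $\pi$ follow from OLS on equation (1.2).
Thus, the downward bias in the estimated 2SLS standard errors must arise from a
downward biased estimate of $\sigma_{\varepsilon\varepsilon}$. We now derive the bias. The intuition follows from
the fact that 2SLS is biased towards the OLS estimator, which minimizes the estimated
$\sigma_{\varepsilon\varepsilon}$. Thus, we find that the bias of the 2SLS estimator of $\beta$ creates a bias in the 2SLS
estimate of $\sigma_{\varepsilon\varepsilon}$. We find the bias to be:

Theorem 7:
$$E[\hat\sigma^2_{2SLS}] = \sigma_{\varepsilon\varepsilon} - \frac{2(K-2)}{n}\,\frac{\sigma_{\varepsilon v}^2}{\Theta} - \frac{1}{n}\,\sigma_{\varepsilon\varepsilon} + \frac{1}{n}\,\frac{\sigma_{\varepsilon\varepsilon}\,\sigma_{vv}}{\Theta}$$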

Note that the leading term in the bias calculation of Theorem 7 is 2 times the bias of the
2SLS estimator from equation (2.1). As either the number of instruments grows or the
covariance between the structural and reduced form stochastic disturbances becomes
large, the bias in the estimation of $\sigma_{\varepsilon\varepsilon}$ will also become large. We now apply the
normalization that we used above to find:

$$E[\hat\sigma^2_{2SLS}] = 1 - \frac{2(K-2)\rho^2(1-R^2)}{nR^2} - \frac{1}{n} + \frac{1-R^2}{nR^2}
 = 1 - \frac{[(2K-4)\rho^2 - 1](1-R^2)}{nR^2} - \frac{1}{n} \qquad (5.1)$$
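To get a feel for the magnitudes implied by equation (5.1), the approximation can be evaluated directly. The following is only a sketch: the function names are ours, and it assumes the reading of (5.1) in which the disturbance variances are normalized to one, the covariance equals ρ, and hence Θ = R²/(1 − R²).

```python
def approx_mean_sigma2(n, K, rho, R2):
    """Second-order approximation to E[sigma_hat^2] as read from equation (5.1).

    Assumes the normalization sigma_ee = sigma_vv = 1 and sigma_ev = rho,
    so that 1/Theta = (1 - R2) / R2 (our reading of the normalization).
    """
    if not 0.0 < R2 < 1.0:
        raise ValueError("R2 must lie strictly between 0 and 1")
    inv_theta = (1.0 - R2) / R2
    return 1.0 - ((2 * K - 4) * rho ** 2 - 1.0) * inv_theta / n - 1.0 / n


def approx_relative_bias(n, K, rho, R2):
    """Approximate bias relative to the true variance, which is 1 here."""
    return approx_mean_sigma2(n, K, rho, R2) - 1.0


# Illustrative values only: the bias becomes more negative as K grows.
for K in (5, 10, 30):
    print(K, approx_relative_bias(n=1000, K=K, rho=0.9, R2=0.1))
```

Because this is a second-order approximation, it should not be expected to track Monte-Carlo results closely when R² is extremely small relative to K/n.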

The bias can be quite substantial, as demonstrated by equation (5.1). The final term in
equation (4.2) will typically be small so that it can be ignored. In Monte-Carlo results
we find that for R² = 0.01 and ρ = 0.9 the mean bias of the 2SLS estimate of the variance
varies from -70% to -80% as K, the number of instruments, increases from 5 to 30.
Thus, the bias in the estimate can be quite large even when K = 5. This finding explains
the result that when weak instruments are present, the estimated standard errors of 2SLS
can appear to be near those of OLS and small enough to allow the researcher to draw
conclusions about the likely true parameter value. However, with weak instruments these
conclusions could be erroneous because of the substantial bias in the estimated standard
error of the 2SLS estimator. Kleibergen (2002) also proposes an alternative approach to
modify inferential procedures, but his approach is based on the LIML estimator rather
than the 2SLS estimator. Hahn-Hausman (2003) and Hahn, Hausman, and Kuersteiner
(2003) discuss problems that may arise with this approach because of non-existence of
moments of the LIML estimator.

[Footnote] The Monte-Carlo design is the same as in Hahn-Hausman (2002a).

We now consider the finding that the often used test of over-identifying
restrictions (OID test) rejects "too often" when weak instruments are present, i.e. the
actual size of the test is considerably larger than the nominal size. See Hahn-Hausman
(2002a), Table xx, where the nominal size is 0.05 while the actual size is sometimes
greater than 0.5. The OID test can be quite important since it tests the economic theory
embodied in the model, as discussed by e.g. Hausman (1983). In the weak instrument
situation it may have increased importance given the substantial bias in the 2SLS
estimator and the large MSE that we calculate in equations (3.1), (3.3) and (3.4). From
Hausman (1983) we write the OID test as:


(5.2)      $W = \dfrac{\hat\varepsilon' P_z \hat\varepsilon}{\hat\sigma_{\varepsilon\varepsilon}}$


W is distributed as chi-square with K-1 degrees of freedom. From equation (5.2), we see
that a downward biased estimate of $\sigma_{\varepsilon\varepsilon}$ can lead to substantial over-rejection and an
upward biased size of the OID test. Thus, correcting for this problem can have an
important effect on test results.
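The mechanism can be illustrated with a stylized calculation. The sketch below is ours, not part of the derivation: it treats the variance estimate in the denominator of W as a fixed fraction of the true variance (a deterministic scaling, which ignores sampling variability) and asks how often a chi-square test with K-1 = 4 degrees of freedom would then reject at the nominal 5% level. The helper names and the ratio 0.3 are illustrative assumptions.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate with an even df.

    Uses the closed form exp(-x/2) * sum_{j < df/2} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def implied_size(critical_value, df, variance_ratio):
    """Rejection probability when the estimated variance equals
    variance_ratio times the true variance (stylized assumption)."""
    # W_reported = W_true / variance_ratio, so rejection occurs when
    # W_true exceeds critical_value * variance_ratio.
    return chi2_sf_even_df(critical_value * variance_ratio, df)


CRIT_95_DF4 = 9.4877  # 5% critical value of chi-square with 4 df
print(implied_size(CRIT_95_DF4, 4, 1.0))  # no bias: close to nominal size
print(implied_size(CRIT_95_DF4, 4, 0.3))  # variance understated by 70%
```

Under this simplification, a variance estimate that is 70% too small is enough to push the rejection rate above one half, which is in line with the order of magnitude of the over-rejection reported above.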

VI  Conclusions 

We derive second order approximations for the bias and MSE of 2SLS (and the
Nagar estimator) with both valid and invalid instruments. The derivation for invalid
instruments is new, to the best of our knowledge. We find that substantial finite sample
bias can occur when weak instruments exist, which arise when the R² of the reduced
form regression is low, the number of instruments is high, or the correlation ρ between
the structural and reduced form stochastic terms is high.

We then compare the bias and MSE of 2SLS with OLS. The OLS estimator is
biased and inconsistent, but its smaller variance may make it preferable to 2SLS in a
weak instruments situation. We determine straightforward and easily checked conditions
under which 2SLS has smaller bias than OLS. These bias conditions carry over, in large
part, to the MSE comparisons because changes in the bias term are quite important in
changes in the MSE term given typical sample sizes of n = 100 or larger. We find that for
R² > 0.1, 2SLS is generally the preferred estimator. However, the econometrician can
use our formulae to check the expected performance of 2SLS and OLS in a given
situation given some a priori knowledge about likely parameter values.

We also demonstrate that a substantial bias exists in the 2SLS estimator for the
variance of the stochastic disturbance, which leads to downward biased 2SLS standard
errors and over-rejection of the test of over-identifying restrictions. We derive a formula
for the bias that would allow for correction of the bias.
Figure 1: Critical Values for Rho (n = 100 and K = 5, 10, 30)
Table 1: Critical Values of ρ

                                          R²
            n      0.01     0.1      0.2      0.3      0.5      0.7      0.9
K = 5      100      **     0.3677   0.2323   0.1863   0.1432   0.1210   0.1070
           500      **     0.1423   0.1002   0.0818   0.0634   0.0536   0.0473
          1000    0.3654   0.1002   0.0708   0.0578   0.0448   0.0378   0.0334

K = 10     100      **       **     0.2601   0.1949   0.1455   0.1220   0.1075
           500      **     0.1445   0.1006   0.0819   0.0634   0.0536   0.0473
          1000      **     0.1006   0.0708   0.0578   0.0448   0.0378   0.0334

K = 30     100      **       **       **       **     0.1789   0.1339   0.1135
           500      **     0.1771   0.1050   0.0834   0.0638   0.0538   0.0474
          1000      **     0.1049   0.0716   0.0581   0.0448   0.0379   0.0334

** denotes no critical value of ρ less than 1.0 exists
Figure 2: Ratio of 2SLS Bias to OLS Bias with Invalid Instruments (n = 100, K = 5, α = 0.1)

Figure 3: Critical Values for Alpha (n = 100 and K = 5, 10, 30)

Figure 4: MSE 2SLS / MSE OLS (Alpha = 0.1 and K = 5)

Figure 5: MSE 2SLS / MSE OLS (Alpha = 0.3 and K = 5)

Figure 6: MSE 2SLS / MSE OLS (Alpha = 0.1 and K = 5)
A     Bekker Asymptotic Distribution of 2SLS, OLS, and Nagar under Misspecification

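Suppose that

$$y_{1i} = y_{2i}\beta + \varepsilon_i = (z_i'\pi_2)\beta + u_{1i}, \qquad y_{2i} = z_i'\pi_2 + v_{2i},$$

where

$$\begin{pmatrix} u_{1i} \\ v_{2i} \end{pmatrix} \sim N\left(0, \begin{pmatrix} \omega_{1,1} & \omega_{1,2} \\ \omega_{1,2} & \omega_{2,2} \end{pmatrix}\right).$$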

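Following is the Lemma reproduced from Hahn and Hausman (2001):

Lemma 1  Let $U = [\, y_1 \;\; y_2 \,]$. Assume that $K/n = \alpha + o(n^{-1/2})$, and that $\pi_2'z'z\pi_2/n$ is fixed at $\Theta$. Let
$S = U'P_zU$ and $S^{\perp} = U'M_zU$. We then have

$$\sqrt{n}\begin{pmatrix}
n^{-1}S_{11} - \left(\beta^2\Theta + \frac{K}{n}\omega_{1,1}\right) \\
n^{-1}S_{12} - \left(\beta\Theta + \frac{K}{n}\omega_{1,2}\right) \\
n^{-1}S_{22} - \left(\Theta + \frac{K}{n}\omega_{2,2}\right) \\
n^{-1}S^{\perp}_{11} - \left(1 - \frac{K}{n}\right)\omega_{1,1} \\
n^{-1}S^{\perp}_{12} - \left(1 - \frac{K}{n}\right)\omega_{1,2} \\
n^{-1}S^{\perp}_{22} - \left(1 - \frac{K}{n}\right)\omega_{2,2}
\end{pmatrix}
\Rightarrow N\left(0, \begin{pmatrix} \Lambda & 0 \\ 0 & \Lambda^{\perp} \end{pmatrix}\right)$$

where $\Lambda$ and $\Lambda^{\perp}$ denote symmetric 3x3 matrices such that

$$\begin{aligned}
\Lambda_{1,1} &= 4\omega_{1,1}\Theta\beta^2 + 2\alpha\omega_{1,1}^2 \\
\Lambda_{1,2} &= 2\omega_{1,1}\Theta\beta + 2\beta^2\Theta\omega_{1,2} + 2\alpha\omega_{1,1}\omega_{1,2} \\
\Lambda_{1,3} &= 4\beta\Theta\omega_{1,2} + 2\alpha\omega_{1,2}^2 \\
\Lambda_{2,2} &= \omega_{1,1}\Theta + \beta^2\Theta\omega_{2,2} + 2\Theta\omega_{1,2}\beta + \alpha\omega_{1,1}\omega_{2,2} + \alpha\omega_{1,2}^2 \\
\Lambda_{2,3} &= 2\omega_{2,2}\Theta\beta + 2\Theta\omega_{1,2} + 2\alpha\omega_{2,2}\omega_{1,2} \\
\Lambda_{3,3} &= 4\omega_{2,2}\Theta + 2\alpha\omega_{2,2}^2
\end{aligned}$$

and

$$\begin{aligned}
\Lambda^{\perp}_{1,1} &= 2(1-\alpha)\omega_{1,1}^2 \\
\Lambda^{\perp}_{1,2} &= 2(1-\alpha)\omega_{1,1}\omega_{1,2} \\
\Lambda^{\perp}_{1,3} &= 2(1-\alpha)\omega_{1,2}^2 \\
\Lambda^{\perp}_{2,2} &= (1-\alpha)\omega_{1,1}\omega_{2,2} + (1-\alpha)\omega_{1,2}^2 \\
\Lambda^{\perp}_{2,3} &= 2(1-\alpha)\omega_{2,2}\omega_{1,2} \\
\Lambda^{\perp}_{3,3} &= 2(1-\alpha)\omega_{2,2}^2
\end{aligned}$$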
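Remark 1  The ω's in the Lemma correspond to the "reduced form". It would be convenient to rewrite
the above with structural form parameters. Because

$$u = \varepsilon + \beta v$$

we can see that

$$\begin{aligned}
\omega_{1,1} &= \sigma_{1,1} + 2\beta\sigma_{1,2} + \beta^2\sigma_{2,2} \\
\omega_{1,2} &= \sigma_{1,2} + \beta\sigma_{2,2} \\
\omega_{2,2} &= \sigma_{2,2}
\end{aligned}$$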
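The mapping in Remark 1 from structural-form to reduced-form second moments is simple enough to code directly. The sketch below uses our own function and argument names; `s11`, `s12`, `s22` stand for σ₁,₁, σ₁,₂, σ₂,₂.

```python
def reduced_form_covariances(beta, s11, s12, s22):
    """Map structural covariances (sigma) to reduced-form ones (omega),
    using u = eps + beta * v as in Remark 1."""
    w11 = s11 + 2.0 * beta * s12 + beta ** 2 * s22
    w12 = s12 + beta * s22
    w22 = s22
    return w11, w12, w22


# Illustrative values (assumed, not from the paper).
w11, w12, w22 = reduced_form_covariances(beta=0.5, s11=1.0, s12=0.3, s22=1.0)
print(w11, w12, w22)
```

One useful sanity check is that the determinant of the covariance matrix is unchanged by this mapping, since the transformation from (ε, v) to (u, v) has unit determinant.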

Lemma  2  Suppose  that  -j^  =  /i  +  o(l).   Then  we  have 
=>       N[Q,V2SLS)

=>     A' (n,  Vols) 


Proof.  Suppose  that  q  =  0.  Using  the  previous  Lemma,  we  obtain 
n^'S,2  \       f  ©-Z^  +  f -^1,2 


.V    0 


and 


n^'  (S,2  -  ^S^,) 


n-i  (S,2  +  5^2) 
2  (^^2,2/3  +  ^^'1.2)  0  4u;2.20


■  J^    0, 


(/3  u-'2.2  +  2aJi_2/?  +  iJ-"],])  0  +  1^1  2  +  '^'1.1  "''2. 2       2  (^,'2.2,/?  +  1^1,2)  ©  +  2u;2.2'^'l, 

4^2  20  +  2u;? 


2  (u;2,2/3  +  '^'1,2)  0  +  2u,'2.2i^'l.2 

Therefore,  using  Delta  method,  we  obtain  tlie  following 
Because  -^  =/i  +  o(l),  we  can  see  that
A.1      Asymptotic Distribution of 2SLS under Misspecification

Note  that 

y'2P^y2  y'2Pzy2 
so that


It  follows  that 
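A.2     Asymptotic Distribution of OLS under Misspecification

Note that

$$b_{OLS} = \frac{y_2'y_1}{y_2'y_2} = \frac{y_2'\left(y_1^* + \frac{1}{\sqrt{n}}z\gamma\right)}{y_2'y_2}
 = \frac{y_2'y_1^*}{y_2'y_2} + \frac{1}{\sqrt{n}}\,\frac{n^{-1}(z\pi_2 + v_2)'z\gamma}{n^{-1}y_2'y_2}$$

But

$$\begin{aligned}
n^{-1}(z\pi_2 + v_2)'z\gamma &= \Xi + n^{-1}v_2'z\gamma = \Xi + o_p(1) \\
n^{-1}y_2'y_2 &= n^{-1}S_{22} + n^{-1}S^{\perp}_{22} = \Theta + \sigma_{2,2} + o_p(1)
\end{aligned}$$

so that

$$\frac{n^{-1}(z\pi_2 + v_2)'z\gamma}{n^{-1}y_2'y_2} = \frac{\Xi}{\Theta + \sigma_{2,2}} + o_p(1)$$

It follows that

$$\begin{aligned}
\sqrt{n}\left(b_{OLS} - \beta - \frac{\sigma_{1,2}}{\Theta + \sigma_{2,2}}\right)
 &= \sqrt{n}\left(\frac{y_2'y_1^*}{y_2'y_2} - \beta - \frac{\sigma_{1,2}}{\Theta + \sigma_{2,2}}\right) + \frac{\Xi}{\Theta + \sigma_{2,2}} + o_p(1) \\
 &\Rightarrow N\left(\frac{\Xi}{\Theta + \sigma_{2,2}},\; V_{OLS}\right)
\end{aligned}$$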

A.3      Asymptotic Distribution of Nagar under Misspecification

Note  that 

,      _    y'2P.yi  -  ^M.yi  _  ^'^^^  (2^'  +  7^^"^')  -  ^^2^.  {y\  +  :j^n 


y^Pzj/2  -  f  y^A/.y2  ^^^.^2  -  ^y'2M.y2 

y'2P^yJ--±y2Mfyi  .    1        n-'(z7r2 +  ^2)'^! 


+ 


But 


so  that 


It  follows  that 


y'2Pzy2  -  ^y'2M.y2      V^  n-i  (y^F.yz  -  ^y^A/.yz) 

n"' (zTTa +  r2)'27     =     H  +  Op(l) 
"^'  ly2P.y2  -  ^y'2M,y2]     =     @  +  Op (1) 

Ti~'  (z7r2  +i^2)'z7         ^  S 


V'n  (6;v  -0)      =      Vn  I  -777- k,.,        -  P  |  +    __w,/^, k3 


^''\y'2P..y2-^y'2M.y2 

(y'2P.y{  -  ^y'2M..y\         \ 

\y'2P.y2-^y'2M^.y2     ^) 


n-'  (y^P.yz  -  7-2/2 A/. y2) 

e 


+  H  +  op(l) 


A^l   Q,V2SLS 


B      Sensitivity  Analysis 


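Consider a model with one endogenous regressor where other included exogenous variables are partialled
out. The model takes the form

$$y_i = x_i\beta + \varepsilon_i, \qquad i = 1, \ldots, n.$$

Denote the available instrument as $z_i$, and write the first stage regression as

$$x_i = z_i'\pi + u_i \qquad (1)$$

The 2SLS estimator is given by

$$\hat\beta_{2SLS} = \left[x'z(z'z)^{-1}z'x\right]^{-1} x'z(z'z)^{-1}z'y,$$

where $x = (x_1, \ldots, x_n)'$, $y = (y_1, \ldots, y_n)'$, and $z = (z_1, \ldots, z_n)'$.

What is the property of $\hat\beta_{2SLS}$ if the exclusion restriction is in fact violated? In order to implement
a violation of the exclusion restriction, we add a little noise to $\varepsilon_i$, and consider a new model

$$y_i^*(\theta) = x_i\beta + z_i'\theta + \varepsilon_i = x_i\beta + \varepsilon_i^*, \qquad (2)$$

where

$$\varepsilon_i^* = z_i'\theta + \varepsilon_i.$$

Let $Y^*(\theta) = (y_1^*(\theta), \ldots, y_n^*(\theta))'$ and

$$\hat\beta_{2SLS}(\theta) = \left[x'z(z'z)^{-1}z'x\right]^{-1} x'z(z'z)^{-1}z'Y^*(\theta),$$

and denote the asymptotic bias by $b_{2SLS}(\theta) = \operatorname{plim}\hat\beta_{2SLS}(\theta) - \beta$.

We would like to examine the maximal asymptotic bias $|b_{2SLS}(\theta)|$ for a small violation of the exclusion
restriction, i.e., a violation such that the correlation between $z_i'\theta$ and $\varepsilon_i^*$ is some small number $\psi$. We
argue that

$$\sqrt{\frac{n^{-1}\sum_i \hat\varepsilon_i^2}{n^{-1}\sum_i x_i^2} \cdot \frac{1}{R_f^2} \cdot \frac{\psi^2}{1 - \psi^2}} \qquad (3)$$

provides such a measure of sensitivity. Here, $R_f^2$ denotes the R² in the first stage.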
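As an illustration, the sensitivity measure can be computed from 2SLS residuals, the (partialled-out) regressor, the first-stage R², and a postulated correlation ψ. This is a minimal sketch under our reading of (3) as the square root of (mean squared residual / mean squared regressor) × (1 / first-stage R²) × ψ² / (1 − ψ²); the function name and the toy inputs are assumptions for illustration only.

```python
import math


def max_asymptotic_bias(residuals, x, r2_first_stage, psi):
    """Empirical sensitivity measure: maximal asymptotic 2SLS bias for a
    violation of the exclusion restriction with correlation psi."""
    n = len(x)
    if n == 0 or len(residuals) != n:
        raise ValueError("residuals and x must be non-empty and equally long")
    if not 0.0 < r2_first_stage <= 1.0:
        raise ValueError("first-stage R^2 must lie in (0, 1]")
    if not -1.0 < psi < 1.0:
        raise ValueError("psi must lie strictly between -1 and 1")
    mean_e2 = sum(e * e for e in residuals) / n
    mean_x2 = sum(v * v for v in x) / n
    ratio = psi ** 2 / (1.0 - psi ** 2)
    return math.sqrt(mean_e2 / mean_x2 / r2_first_stage * ratio)


# Toy inputs: the same violation hurts more when the first stage is weaker.
resid = [1.0, -1.0, 1.0, -1.0]
xs = [2.0, -2.0, 2.0, -2.0]
print(max_asymptotic_bias(resid, xs, r2_first_stage=0.25, psi=0.1))
print(max_asymptotic_bias(resid, xs, r2_first_stage=0.01, psi=0.1))
```

The measure scales with the inverse square root of the first-stage R², so dividing the first-stage R² by 25 multiplies the maximal bias by 5 for the same ψ.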


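B.1      Derivation of (3)

It can be shown that [1]

$$b_{2SLS}(\theta) = (\pi'\Phi\pi)^{-1}\pi'\Phi\theta$$

where

$$\Phi = \operatorname{plim} n^{-1}Z'Z$$

Note that

$$\frac{\partial b_{2SLS}(\theta)}{\partial\theta'} = (\pi'\Phi\pi)^{-1}\pi'\Phi$$

which is maximized when $\theta \propto \pi$. We therefore focus on the type of violation such that $\theta = \zeta \cdot \pi$ for some
scalar $\zeta$. [2] Without loss of generality, we will write

$$b_{2SLS}(\theta) = b_{2SLS}(\zeta \cdot \pi) = b_{2SLS}(\zeta)$$

Note that the population R² in the regression of $\varepsilon_i^*$ on $z_i$, which is equal to the square of the correlation
$\psi$ between $\varepsilon_i^*$ and $z_i'\pi$, is equal to

$$\psi^2 = \frac{\theta'\Phi\theta}{\theta'\Phi\theta + E[\varepsilon_i^2]} = \frac{\zeta^2 \cdot \pi'\Phi\pi}{\zeta^2 \cdot \pi'\Phi\pi + E[\varepsilon_i^2]} \qquad (4)$$

and

$$b_{2SLS}(\zeta) = (\pi'\Phi\pi)^{-1}\pi'\Phi(\zeta \cdot \pi) = \zeta \qquad (5)$$

We can solve (4) for $\zeta^2$, and obtain

$$\zeta^2 = \frac{\psi^2}{1 - \psi^2} \cdot \frac{E[\varepsilon_i^2]}{\pi'\Phi\pi} \qquad (6)$$

Now, note that the population R² in the first stage, $R_f^2$, is equal to

$$R_f^2 = \frac{\pi'\Phi\pi}{E[x_i^2]}$$

which can be solved for $\pi'\Phi\pi$ as

$$\pi'\Phi\pi = R_f^2 \cdot E[x_i^2] \qquad (7)$$

Combining (6) and (7), we obtain

$$\zeta^2 = \frac{E[\varepsilon_i^2]}{E[x_i^2]} \cdot \frac{1}{R_f^2} \cdot \frac{\psi^2}{1 - \psi^2}$$

or

$$|\zeta| = |b_{2SLS}(\zeta)| = \sqrt{\frac{E[\varepsilon_i^2]}{E[x_i^2]} \cdot \frac{1}{R_f^2} \cdot \frac{\psi^2}{1 - \psi^2}} \qquad (8)$$

We note that (8) can be approximated by the empirical counterpart

$$|\hat\zeta| = |\hat b_{2SLS}(\zeta)| = \sqrt{\frac{n^{-1}\sum_i \hat\varepsilon_i^2}{n^{-1}\sum_i x_i^2} \cdot \frac{1}{R_f^2} \cdot \frac{\psi^2}{1 - \psi^2}}$$

[1] See next subsection for a slightly more general proof.

[2] Maximization of $\|b_{2SLS}(\theta)\|$ with respect to $\theta$ fixing $\theta'\Phi\theta$ constant has the purpose of maximizing the asymptotic
bias $b_{2SLS}(\theta)$ for a fixed population R² in the regression of $\varepsilon_i^*$ on $z_i$. Because

$$|\pi'\Phi\theta|^2 \le (\pi'\Phi\pi) \cdot (\theta'\Phi\theta)$$

with equality when $\theta \propto \pi$, we can say that $\pi$ is the direction that maximizes the sensitivity (inconsistency) of $b_{2SLS}$ for a
given amount of violation of the exclusion restriction.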


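B.2     Digression: Robustness of 2SLS

In general, we estimate $\beta$ by

$$\hat\beta_A = \left[(ZA)'X\right]^{-1}(ZA)'Y$$

and the counterpart under small misspecification is

$$\hat\beta_A(\theta) = \left[(ZA)'X\right]^{-1}(ZA)'Y^*(\theta)$$

so that

$$\begin{aligned}
b_A(\theta) &= \operatorname{plim}\hat\beta_A(\theta) - \beta \\
 &= \operatorname{plim}\left[(ZA)'X\right]^{-1}(ZA)'Y^*(\theta) - \beta \\
 &= \operatorname{plim}\left[(ZA)'X\right]^{-1}(ZA)'(X\beta + Z\theta + \varepsilon) - \beta \\
 &= \beta + \operatorname{plim}\left[(ZA)'X\right]^{-1}(ZA)'Z\theta - \beta \\
 &= \operatorname{plim}\left[A'Z'X\right]^{-1}A'Z'Z\theta \\
 &= \operatorname{plim}\left[A'Z'Z(Z'Z)^{-1}Z'X\right]^{-1}A'Z'Z\theta \\
 &= (A'\Phi\pi)^{-1}A'\Phi\theta
\end{aligned}$$

Note that

$$\frac{\partial b_A(\theta)}{\partial\theta'} = (A'\Phi\pi)^{-1}A'\Phi$$

and

$$\frac{\partial b_{2SLS}(\theta)}{\partial\theta'} = (\pi'\Phi\pi)^{-1}\pi'\Phi \qquad (9)$$

Instead of dealing with an awkward normalization involving the weight matrix $\Phi$, it is convenient to
assume that $\Phi = I$. We then have

$$\frac{\partial b_{2SLS}(\theta)}{\partial\theta'} = (\pi'\pi)^{-1}\pi'$$

and

$$\frac{\partial b_A(\theta)}{\partial\theta'} = (A'\pi)^{-1}A'$$

Remark 2  If there is only one instrument, then $\partial b_{2SLS}(\theta)/\partial\theta = \pi^{-1}$. Therefore, small $\pi$ indicates that 2SLS
is sensitive to misspecification.

Remark 3  If there are multiple components in $\pi$, and if the first component of $\pi$ is small relative to
other components of $\pi$, then $\partial b_{2SLS}(\theta)/\partial\theta_1$ would be small, i.e., 2SLS is not very sensitive to the violation of
the exclusion restriction in $z_{i,1}$.

Remark 4  Note that

$$\left\|\frac{\partial b_{2SLS}(\theta)}{\partial\theta'}\right\|^2 = (\pi'\pi)^{-1}$$

and

$$\left\|\frac{\partial b_A(\theta)}{\partial\theta'}\right\|^2 = (A'\pi)^{-1}A'A(A'\pi)^{-1} = \frac{\|A\|^2}{(A'\pi)^2} \ge \frac{\|A\|^2}{\|A\|^2\|\pi\|^2} = \frac{1}{\|\pi\|^2} = \left\|\frac{\partial b_{2SLS}(\theta)}{\partial\theta'}\right\|^2$$

Therefore, 2SLS is the most robust estimator among the class of IV estimators $b_A$.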
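The inequality behind Remark 4 is the Cauchy-Schwarz inequality, and it can be checked numerically. The sketch below assumes Φ = I and a single endogenous regressor, so that A and π are K-vectors; the function names and the particular π are ours.

```python
import random


def iv_sensitivity_norm_sq(A, pi):
    """Squared norm of (A'pi)^{-1} A', i.e. ||A||^2 / (A'pi)^2, with Phi = I."""
    a_pi = sum(a * p for a, p in zip(A, pi))
    if a_pi == 0.0:
        raise ValueError("A'pi must be non-zero for the IV estimator to exist")
    return sum(a * a for a in A) / a_pi ** 2


def tsls_sensitivity_norm_sq(pi):
    """Squared norm of (pi'pi)^{-1} pi', i.e. 1 / ||pi||^2."""
    return 1.0 / sum(p * p for p in pi)


random.seed(1)
pi = [0.5, 0.2, 0.1]  # illustrative first-stage coefficients
bound = tsls_sensitivity_norm_sq(pi)
for _ in range(5):
    A = [random.gauss(0.0, 1.0) for _ in pi]
    # Any other weighting A is at least as sensitive as A proportional to pi.
    print(iv_sensitivity_norm_sq(A, pi) >= bound)
```

Setting A proportional to π attains the bound, which is the sense in which 2SLS is the least sensitive member of this class.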


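C      Higher Order Bias of $\tilde\sigma^2$

Our model is given by

$$\begin{aligned}
y_i &= x_i\beta + \varepsilon_i, \\
x_i &= f_i + u_i = z_i'\pi + u_i, \qquad i = 1, \ldots, n
\end{aligned}$$

where $(\varepsilon_i, u_i)$ is homoscedastic and normal. We consider the 2SLS

$$b_{2SLS} = \frac{x'Py}{x'Px}$$

and the related estimator for the variance of $\varepsilon_i$

$$\tilde\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - x_i b_{2SLS}\right)^2$$

We have the following characterization of $\tilde\sigma^2$:

$$\begin{aligned}
\tilde\sigma^2 &= \frac{1}{n}\sum_{i=1}^{n}\left(\varepsilon_i - x_i(b_{2SLS} - \beta)\right)^2 \\
 &= \frac{\varepsilon'\varepsilon}{n} - 2\left(\frac{f'\varepsilon}{n} + \frac{u'\varepsilon}{n}\right)(b_{2SLS} - \beta)
   + \left(H + 2\frac{f'u}{n} + \frac{u'u}{n}\right)(b_{2SLS} - \beta)^2
\end{aligned}$$

where

$$H = \frac{1}{n}f'f = \frac{1}{n}\pi'Z'Z\pi$$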
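The characterization of the variance estimator above is an exact algebraic identity: with x = f + u and y = xβ + ε, the mean squared residual at any candidate coefficient b equals ε'ε/n − 2(f'ε/n + u'ε/n)(b − β) + (H + 2f'u/n + u'u/n)(b − β)², with H = f'f/n. A small numerical check makes this concrete. The sketch below uses simulated data of our own choosing; all names and parameter values are illustrative.

```python
import random


def mean_product(a, c):
    """Sample second moment (1/n) * sum a_i * c_i."""
    return sum(ai * ci for ai, ci in zip(a, c)) / len(a)


def sigma2_direct(y, x, b):
    """Variance estimate (1/n) * sum (y_i - x_i * b)^2."""
    return sum((yi - xi * b) ** 2 for yi, xi in zip(y, x)) / len(y)


def sigma2_decomposed(f, u, eps, beta, b):
    """The same quantity written in terms of f, u, eps and (b - beta)."""
    d = b - beta
    H = mean_product(f, f)
    return (mean_product(eps, eps)
            - 2.0 * (mean_product(f, eps) + mean_product(u, eps)) * d
            + (H + 2.0 * mean_product(f, u) + mean_product(u, u)) * d * d)


random.seed(0)
n, beta = 200, 1.0
f = [random.gauss(0.0, 1.0) for _ in range(n)]
u = [random.gauss(0.0, 1.0) for _ in range(n)]
eps = [0.5 * ui + random.gauss(0.0, 1.0) for ui in u]  # correlated with u
x = [fi + ui for fi, ui in zip(f, u)]
y = [xi * beta + ei for xi, ei in zip(x, eps)]
b = 1.3  # any candidate estimate; the identity does not require b to be 2SLS
gap = abs(sigma2_direct(y, x, b) - sigma2_decomposed(f, u, eps, beta, b))
print(gap)  # should be at floating-point rounding level
```

The identity holds for any b, so the bias of the variance estimator is driven entirely by how (b − β) co-moves with the sample moments in the expression.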


Lemma  3 


for 


T,     =     ^/'t  =  0,(l) 


n     =     -2{^]-h(^f'e]=0, 


n   J  H\  ^  )       ^P  U 

2 
77.     )      m    \^ 

Proof.  Note  that  2SLS  is  a  special  case  of  the  fc-class  estimator 

-  x'Py  -  k  ■  x'My 

^^  ~  x'Px  -  k  ■  x'Mx 
for 


^6     =     -2     V     77^^     =Op 


T.  -  2^^^\i,(^r.\.o(' 


n 

and  6  is  the  "eigenvalue" .    Note  that  2SLS  corresponds  to  a  =  0  and  6  =  0.    The  result  follows  from 
Donald  and  Newey  (1998).    ■ 
We  therefore  obtain 


Lemma  4 


a^     =     ol 


1  ^(e's  n\  2  (\^ 


1 


'  n 


Proof.  We  have 


Because 


T:     =     Op(l) 


and 


^    -   o.(i),       ^^4t,^oV^ 


n 


n         v^.  '  VV" 


^  =  °'(^)'      "" 


=   0, H-  .      —  =  o,(i) 


we  obtain 


^2  e  £ 

a       =     — 
n 


1    /..        u'u\    /  1  „\^  /I 


K\"^-      [tJ^        ^^^[n 


Now,  note  that 


V^(i£-aM=Op(i),       y;:^f£-^-a,..J=o,(i),       /^f^_a?,  )  =  o,(i) 


We  therefore  obtain 


al  +  -7=V^ 


V^         \  n 


o. 


e'e         2         1      /-  A'f         2 
—  =  r     '  "  ~ 

n 

3 


f    /  ' 


1 


;^--(^^E^.j+^v/^(v-"^'')(i 


H^')^«.' 


and 


It  follows  that 


n  \  n    I  \  H       I         n  H  n  H^  Vn 


?2       =       ^2^_Ly^('^_^2 


/n         V  n 


|--(^^e^^J"^v"(t-"" 


;|-U?^'    -^Kv-)(»-^^ 


1  Tf        1  a^T/  ^  1 




a      =     a  c 


Vn 


+Ob 


-  (J.  \ 7=a^ 


H' 


,T, 


¥^^ji''^ 


2    r-  /  «^'" 
n         \   n 


1  r?   1  crin 

n  H       n    m 


Condition  1   Assume  that  we  can  ignore  the  Op  (^j  term  in  Lemma  4  in  calculation  of  expectation. 
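Theorem 1

$$E[\tilde\sigma^2] \approx \sigma_\varepsilon^2 - \frac{2(K-2)}{n}\,\frac{\sigma_{\varepsilon u}^2}{H} - \frac{1}{n}\sigma_\varepsilon^2 + \frac{1}{n}\,\frac{\sigma_\varepsilon^2\sigma_u^2}{H}$$

where $H = \frac{1}{n}f'f = \frac{1}{n}\pi'Z'Z\pi$.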


Proof.  From  Lemma  4,  we  have 


-2  2 

a       =     a^ 


1     ^  fe'e 
Jn         \  n 


H=  -f'f=  -n'Z'Zn 
n  n 


■?\         2  /  1  ^ 


2.,..(..,,..,)_£^(5;._„J(...)_l2,lz|2 


+  0r, 


Because  expected  values  of  the  Op  ( -j=  I   terms  in  the  second  line  are  zero,  it  suffices  to  consider  the 
Op  (-)  in  the  third  line.  First,  we  note  that 


E[T2]     =     E 
E[T^]     =     E 


u'Pe 


--Ka, 


\/n 


-?)M> 


from  which  we  obtain 


Second,  we  note  that 


2  /  1  ^        1  ^ 


==  — r^^ 


2{K-2)al 
n  H 


2         fe'u  W  1 

n         \    n  I  \  H 


due  to  symmetry.  Third,  we  note  that 


E  [rn  =  Ha] 




from  which  we  obtain 


1 T2   1  oin 


2T-21 


n  H       n    m 


n  n    H 


We  therefore  obtain 


Eld 


'      n  H  n    '       n    H 


Remark 5  In order to understand Theorem 1, imagine a counter-factual situation where the first order
asymptotic approximation for $\sqrt{n}(b_{2SLS} - \beta)$ is exact, i.e., write

$$\sqrt{n}(b_{2SLS} - \beta) = \frac{1}{H}T_1$$

We would then have


::?2  ^2 

a       =     (7, 


Jn         \  n 


—^Ofu       ttTi 


yn 


2     r-  /e'w 

-  —  v/n a 

n  \  n 


fu  '  I  Jj^'^ 


H 


n  H       n    //2 


+  Od 


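and

$$E[\tilde\sigma^2] = \sigma_\varepsilon^2 - \frac{1}{n}\sigma_\varepsilon^2 + \frac{1}{n}\,\frac{\sigma_\varepsilon^2\sigma_u^2}{H}$$

Therefore, Theorem 1 implies that the approximate mean of $\tilde\sigma^2$ is smaller by

$$\frac{2(K-2)\sigma_{\varepsilon u}^2}{nH}$$

than would be expected out of first order asymptotic approximation.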

Remark 6  Theorem 1 can be understood from a different perspective. Note that the approximate bias of
2SLS is equal to

$$\frac{(K-2)\sigma_{\varepsilon u}}{nH}$$

Roughly speaking, 2SLS is biased toward OLS, which minimizes $\frac{1}{n}\sum_{i=1}^{n}(y_i - x_i b)^2$ with respect to $b$. If
the 2SLS estimator $b_{2SLS}$ is close to the OLS estimator $b_{OLS}$, then we should expect

i=l  1  =  1  i=]  1=1 




;   o  o  n 


\h 